Summaries and extracts provide a concise document description that is more revealing than a document title, yet brief enough to be absorbed in a single glance. The desirability of summaries and extracts is increased by the large quantity of on-line, machine-readable information currently available.
Traditional author-supplied indicative abstracts, when available, fulfill the need for a concise document description. The absence of author-supplied abstracts can be overcome with automatically generated document summaries. Numerous researchers have addressed automatic document summarization. The nominal task of generating a coherent narrative summarizing a document is currently considered too problematic because it encompasses discourse understanding, abstraction, and language generation. A simpler approach avoids the central difficulties of language understanding by defining document summarization as summary by extraction. That is, the goal of this approach is to find a subset of the sentences of a document that is indicative of document content. Typically, under this approach each document sentence is scored and the highest scoring sentences are selected for extraction.
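The score-and-select step described above can be illustrated with a minimal sketch. The sentence splitter, the function names, and the length-based scorer are all assumptions made for illustration; none of them comes from a prior summarizer, and any scoring function could be substituted.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_summary(text: str, score: Callable[[str], float], k: int = 2) -> List[str]:
    sentences = split_sentences(text)
    # Rank sentence indices by score, highest first; ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    # Re-sort the chosen indices so the extract reads in document order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


def length_score(sentence: str) -> float:
    # Placeholder scorer (assumption): longer sentences score higher.
    return float(len(sentence.split()))
```

Calling `extract_summary(text, length_score, k=2)` returns the two highest scoring sentences in their original order. Keeping document order is a design choice of this sketch that tends to make an extract easier to read.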
Numerous heuristics have been proposed to score sentences for extracting summarization. Existing evidence suggests that combinations of features yield the best performance. At least one prior extracting summarizer uses multiple features, which are weighted manually by subjective estimation. Manually assigning feature weights to obtain optimal performance is difficult when many features are used.
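One simple way to combine several features with manually chosen weights is a weighted sum, sketched below. The feature names and weight values are hypothetical and serve only to show the form of the computation; they are not taken from any prior summarizer.

```python
from typing import Dict


def combined_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    # Weighted sum of feature values; a feature with no weight contributes nothing.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hand-picked weights (hypothetical values chosen by subjective estimation).
manual_weights = {"keyword": 0.5, "location": 0.3, "cue": 0.2}

# Feature values for two hypothetical sentences.
sentence_a = {"keyword": 1.0, "location": 0.0, "cue": 1.0}
sentence_b = {"keyword": 0.0, "location": 1.0, "cue": 0.0}
```

Under these assumed weights `sentence_a` outscores `sentence_b`. Each additional feature adds another weight that must be chosen by hand, which is why manual weighting becomes harder as the feature set grows.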
Prior features used for extracting summarization include frequency-keyword heuristics, location heuristics, and cue words. Frequency-keyword heuristics use the most common content words as indicators of the main document theme. Location heuristics assume that important sentences lie at the beginning and end of a document, in the first and last sentences of paragraphs, and immediately below section headings. Cue words are words that are likely to accompany indicative or informative summary material, e.g., "In summary."
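The three kinds of heuristics can each be sketched as a small feature function. The stop-word list, the cue phrases, and all function names below are assumptions for illustration. The location sketch covers only the first and last sentences of a paragraph; document position and section headings are omitted for brevity.

```python
import re
from collections import Counter
from typing import List, Set

# Small illustrative lists (assumptions, not taken from any prior system).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "that", "for", "on", "by"}
CUE_PHRASES = ("in summary", "in conclusion")


def tokenize(text: str) -> List[str]:
    # Lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def theme_words(sentences: List[str], top_n: int = 5) -> Set[str]:
    # Frequency-keyword heuristic: the most common content words mark the theme.
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
    return {word for word, _ in counts.most_common(top_n)}


def keyword_feature(sentence: str, themes: Set[str]) -> float:
    # Fraction of the sentence's tokens that are theme words.
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(1 for w in words if w in themes) / len(words)


def location_feature(index_in_paragraph: int, paragraph_length: int) -> float:
    # 1.0 for the first or last sentence of a paragraph, otherwise 0.0.
    return 1.0 if index_in_paragraph in (0, paragraph_length - 1) else 0.0


def cue_feature(sentence: str) -> float:
    # 1.0 if the sentence contains any cue phrase, otherwise 0.0.
    lowered = sentence.lower()
    return 1.0 if any(phrase in lowered for phrase in CUE_PHRASES) else 0.0
```

Each function returns a value between 0.0 and 1.0, so the features share a common scale and can be combined into a single sentence score.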